About

This notebook is to facilitate collaboration on building and evaluating models to estimate pharmaceutical product expiration date. The notebook contains:

  • An example reaction mechanism that can be used to *simulate* theoretical data.
  • Several models to predict shelf life based on various assumptions.

To compare models:

  1. Set the theoretical, experimental parameters, and acceptance parameters in indicated area below.
  2. Run the simulation and models *Cell-> Run All*
  3. The model compairsons are updated and summarized in graphs and tables in the notebook.

To contribute, please make a copy of the notebook (File -> Make a Copy), and rename. The following modifications can be made.</b>

  • Plain english comments can be added by clicking on and existing cell and typing, or creating a new cell (*Cell -> Insert*) then changing the cell contents to markdown (*Cell -> Cell Type -> Markdown*).
  • A new model can be added by going to the bottom and appending the code.
  • The reaction mechanism can be changed by modifying the integration cell below.

For general help on using jupyter notebooks, see the jupyter home page
For more information on the impurity prediction project see the project github page.

Example degredation reaction with two pathways

In this example the impurity product, $P$, can either be produced from crystalline drug, $D$, or an amorphous intermediate, $I$, that is more reactive (this example is extracted from Waterman et al. Pharm Res Vol 24 2007 p 783). \begin{eqnarray} \frac{dP}{dt} = k_3 D + k_2 I \\ \frac{d I}{dt} = k_{1f} D - k_{1r} I - k_2 I \\ \frac{d D}{dt} = - k_{1f} D + k_{1r} I - k_3 D \end{eqnarray}

Expressing the system in matrix form. \begin{equation} \frac{d}{dt} \begin{bmatrix} P \\ I \\ D \end{bmatrix} = \begin{bmatrix} 0 & k_2 & k_3 \\ 0 & (-k_{1r} - k_2) & k_{1f} \\ 0 & k_{1r} & (-k_{1f}-k_3) \end{bmatrix} \begin{bmatrix} P \\ I \\ D \end{bmatrix} \end{equation}

The rate constants are assumed to follow an Arrhenius temperature dependence: \begin{equation} k = A \exp \left (- \frac{E}{RT} \right ) \end{equation}

Edit the parameters below for the above reaction model.
A : refers to the Arrhenius prefactor ($\textrm{sec}^{-1}$). The subfix is defined by the above model.
E : refers to the Arrhenius activation energy (kcal/mol). The subfix is defined by the above model.
Po, Io, Do : refer to the initial product $P$, intermediate, $I$ and drug, $D$, component concentrations in the above model. The concentrations are defined as the fraction of component relative to the total amount of starting reactant. For example, if the starting formulation has 90% of the drug as crystalline, and 10% amorphous, then the ratios would be $D_o$=0.9, $I_o$=0.1, and $P_o$=0.
Temperatures: refers to the stability testing temperatures.
days: refers to the time points that each sample is measured.
impurity_setpoint: defines the acceptance criteria for the impurity concentration

When finished run menu Cell -> Run Cells. Scroll down to see the update results.


In [1]:
# kinetic parameters (kcal/mol)
A1f = 1e4
E1f = 22

A1r = 1e4
E1r = 26

A2 = 1e6
E2 = 20

A3 = 1e5
E3 = 21

Po = 0
Io = .1
Do = 0.9

# temperatures (up to 4 different temperatures)
Temperatures = [25, 40, 60, 80]

# time points in days
days = [[0, 7, 14], # days at first temperature
        [0, 5, 10], # days at second temperature
        [0, 3, 7], # days at third temperature
        [0, 1, 5]] # days at fourth temperature

# acceptance criteria for impurity concentraiont in fraction of product degredation
impurity_setpoint = 0.05

Here are the main Python imports for solving the ODEs, plotting and other data analysis. Import modules as needed.


In [2]:
# Python module imports
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
from scipy.integrate import odeint
from IPython.display import Image
from IPython.core.display import HTML 
from scipy.optimize import minimize
from scipy.optimize import curve_fit
import statsmodels.formula.api as sm
%matplotlib inline
from pdb import set_trace

plt.rcParams.update({'figure.figsize':(8,6),
                     'figure.titlesize': 16.,
                     'axes.labelsize':14., 
                     'xtick.labelsize':12., 
                     'legend.fontsize': 12.})

The following numerically integrates the reaction rate to give the exact impurity profile. Nothing needs to be modified here by the user. Scroll down to see calculation outputs and model comparisons.


In [3]:
# gas constant kcal/mol K
R = 0.00198588

# define the domain for the ODEs solution
ndays = np.max(days) + 365*10
dt = 5000.
t = np.arange(0, ndays*(24*3600), dt)
npts = t.shape[0]

# differential equation solution variable
cols = [(T, C) for T in Temperatures for C in ['P','I','D']]
concentrations = pd.DataFrame({col: np.zeros(t.shape[0]) for col in cols})
concentrations['t'] = t

# experimental results
iterables = [(T, d) for i, T in enumerate(Temperatures) for d in days[i]]
index = pd.MultiIndex.from_tuples(iterables, names=['Temperature', 'days'])
results = pd.DataFrame(data=np.zeros(12), index=index, columns=['P'])

# predicted times
predictions = pd.DataFrame(data=np.zeros(len(Temperatures)), index=Temperatures, columns=['Theory'])

# calculate rate constants from the user defined Arrhenius values
def Arrhenius(A, E, T):
    return A*np.exp(-E/(R*T))

# calculate the rate constants for each user defined temperature
k1f = [Arrhenius(A1f, E1f, T+273.) for T in Temperatures]
k1r = [Arrhenius(A1r, E1r, T+273.) for T in Temperatures]
k2 = [Arrhenius(A2, E2, T+273.) for T in Temperatures]
k3 = [Arrhenius(A3, E3, T+273.) for T in Temperatures]

# definition of the derivative of the concentrations vector variable. 
def deriv(y, t, k1f, k1r, k2, k3):
    P, I, D = y
    dPdt = k2*I + k3*D
    dIdt = (-k1r-k2)*I + k1f*D
    dDdt = k1r*I + (-k1f-k3)*D
    dydt = [dPdt, dIdt, dDdt]
    return dydt

# initial conditions (user defined)
yo = [Po, Io, Do]

# call solver for each user defined temperature
for i, T in enumerate(Temperatures):
    yt = odeint(deriv, yo, t, args=(k1f[i], k1r[i], k2[i], k3[i]))
    for j, C in enumerate(['P', 'I', 'D']):
        concentrations[T,C] = yt[:,j]

# find the impurity concentration at the user defined time points 
index = (np.array(days)/float(ndays)*npts).astype('int')
for i, T in enumerate(Temperatures):
    for j, d in enumerate(days[i]):
        results.loc[T,d].P = concentrations[T,'P'].iloc[index[i,j]]
        
# find the time that the reaction will produce the user-defined impurity concentration
for T in Temperatures:
    predictions.loc[T].Theory = t[np.argmin(np.abs(concentrations[T,"P"] - impurity_setpoint))]/(3600*24*365)

Below are the calculated results for the user-defined kinetics


In [4]:
t_days = t / (24*3600)
max_days_index = np.argmin(np.abs(t_days - (np.max(days)+1)))
t_days = t_days[:max_days_index]
c = concentrations.iloc[:max_days_index]
max_conc = c.iloc[:, c.columns.get_level_values(1)=='P'].max().max()

fig = plt.figure(figsize=(14,10))
for i, T in enumerate(Temperatures):
    ax = fig.add_subplot(2,2,i+1)
    ax.plot(t_days, c[T,'P'], color='red', label='P')
    ax.plot(days[i], results.loc[T].P, ls="none", marker='o', color="red")
    ax2 = ax.twinx()
    ax2.plot(t_days, c[T,'I'], color='green', label='I')
    ax2.plot(t_days, c[T,'D'], color='blue', label='D')
    ax.set_ylim([0, max_conc])
    ax2.set_ylim([0,1])
    ax.set_xlabel("time (days)")
    ax.set_ylabel("impurity concentration (%)")
    ax2.set_ylabel("intermediate and drug concentration (%) ")
    ax.legend(['P'], loc="upper left")
    ax2.legend(loc="upper right")
    ax.set_title("%s C"%(Temperatures[i]))


Concentration profiles: The solid lines are the exact concentrations of the components $P$, $I$, $D$ as a function of time calculated from the defined reaction parameters. The filled symbols denote the experimental time points (defined by the user in the parameter list).


In [5]:
results


Out[5]:
P
Temperature days
25 0 0.000000
7 0.000147
14 0.000295
40 0 0.000000
5 0.000538
10 0.001074
60 0 0.000000
3 0.002224
7 0.005179
80 0 0.000000
1 0.004148
5 0.019836

Theoretical concentration measurements: The above tabulates the theoretical impurity concentrations, $P$, at the measurement time points.


In [6]:
t_days = t / (24*3600)
max_conc = concentrations.iloc[:, concentrations.columns.get_level_values(1)=='P'].max().max()

fig = plt.figure(figsize=(14,10))
for i, T in enumerate(Temperatures):
    ax = fig.add_subplot(2,2,i+1)
    ax.plot(t_days, concentrations[T,'P'], color='red', label='P')
    ax.plot(t_days, [impurity_setpoint]*t_days.shape[0], color='red', ls=':')
    ax.plot(days[i], results.loc[T].P, ls="none", marker='o', color="red")
    ax.set_ylim([0, max_conc])
    ax.set_xlabel("time (days)")
    ax.set_ylabel(r"impurity concentration (%)")
    ax.legend(['P'], loc="upper left")
    ax.set_title("%s C"%(Temperatures[i]))
    ax2 = ax.twinx()
    ax2.plot(t_days, concentrations[T,'I'], color='green', label='I')
    ax2.plot(t_days, concentrations[T,'D'], color='blue', label='D')
    ax2.set_ylim([0,1])
    #ax2.set_ylabel(r"drug concentration")
    ax2.legend(loc="upper right")


Concentration profiles: The plots above show the concentration profiles over a duration sufficient that the impurity concentrations reaches the user defined set point (denoted by the red dashed line).


In [7]:
predictions


Out[7]:
Theory
25 8.006247
40 1.556317
60 0.220225
80 0.038844

Shelf life: The above tabulates the theoretical shelf life in years calculated from the user defined kinetic parameters.

Shelf life prediction assuming zero order kinetics

The actual reaction mechanism is unknown in advance, thus a simple reaction mechanism is assumed. Assuming the reaction kinetics are modeled by a zeroth order on the reactive drug product: \begin{eqnarray} r_P \equiv \frac{dP}{dt} = A \exp(-\frac{E}{RT}) \\ P = A \exp(-\frac{E}{RT})t \\ \ln \left ( \frac{P}{t} \right ) = \ln A - \frac{E}{R} \frac{1}{T} \end{eqnarray} The parameters, $A$ and $E$ can readily be obtained by a plot of $\ln \left ( \frac{P}{t} \right ) \equiv \ln (k_P)$ versus $\frac{1}{T}$.
The time is then calculated as: \begin{equation} t = \frac{P}{\hat{A} \exp(-\frac{\hat{E}}{RT})} \end{equation}


In [8]:
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
r = results.iloc[results.index.get_level_values(1)!=0].copy()
r['y'] = np.log(np.divide(r['P'].values, r.index.get_level_values(1).astype('f8').values*24*3600+1))
r['x'] = 1./(r.index.get_level_values(0).astype('f8').values + 273)
ax.plot(r.x, r.y, ls='none', marker='o')
ax.set_xlabel(r'$1/T$')
ax.set_ylabel(r'$\ln (P/t)$')

reg = sm.ols(formula="y ~ x", data=r).fit()
pred = reg.params[1]*r.x + reg.params[0]
ax.plot(r.x, pred)

A = np.exp(reg.params[0])
E = -1*reg.params[1]*R

print "regression fit for A = %s "%(A)
print "regression fit for E = %s"%(E)


regression fit for A = 113235.62698 
regression fit for E = 19.9829194279

Regression analysis: The above figure plots the logarithm of the measured impurity concentration divided by time, $\ln (P/t)$, against the reciprical temperature, $1/T$ (filled circles), which would be linear for a zero-order kinetics. The line is the result of a linear fit to the data.


In [9]:
print reg.summary()


                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 1.464e+05
Date:                Tue, 12 Apr 2016   Prob (F-statistic):           2.15e-14
Time:                        17:49:41   Log-Likelihood:                 23.586
No. Observations:                   8   AIC:                            -43.17
Df Residuals:                       6   BIC:                            -43.01
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     11.6372      0.082    142.596      0.000      11.438      11.837
x          -1.006e+04     26.301   -382.585      0.000   -1.01e+04   -9998.144
==============================================================================
Omnibus:                        5.017   Durbin-Watson:                   1.772
Prob(Omnibus):                  0.081   Jarque-Bera (JB):                1.518
Skew:                          -1.050   Prob(JB):                        0.468
Kurtosis:                       3.379   Cond. No.                     5.08e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.08e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
/usr/lib/python2.7/dist-packages/scipy/stats/stats.py:1293: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=8
  int(n))

The above regression provides the kinetic parameters necessary to calculate the predicted shelf-life assuming the mechanism is zeroth-order.


In [10]:
temp = R*(predictions.index.astype('f8').values+273)
k = A * np.exp(-E/temp)
predictions['zero_order'] = impurity_setpoint / k * 1/(3600*24*365)

In [11]:
predictions


Out[11]:
Theory zero_order
25 8.006247 6.470045
40 1.556317 1.282699
60 0.220225 0.186024
80 0.038844 0.033576

Table of predicted results The above table compares the theoretical shelf-life in years (calculated from the exact reaction mechanism and user defined parameters) to the predicted shelf-life assuming zero order kinetics.

Shelf life prediction assuming first-order kinetics

Assuming a reaction mechanism as: In this prediction model, the impurity production rate is assumed to be first-order in the total drug concentratjion. \begin{equation} r_P \equiv \frac{dP}{dt} = k D_T \end{equation} where $D_T$ is the total drug concentration, $P$ is the impurity concentration and $k$ is the kinetic parameters, which is assumed to have an Arrehnius temperature dependence. \begin{equation} k = A \exp \left ( -\frac{E}{RT} \right ) \end{equation} The impurity concentration is linked to the drug concentration by a mass balance \begin{equation} D = (D_o + I_o) - P = D_{To} - P \end{equation} where $D_o$ is the starting crystalline drug concentration, $I_o$ is the starting amorphous drug concentration, and $D_{To}$ is the starting total drug concentration. Substituting, \begin{equation} \frac{dP}{dt} = k [D_{To} - P] \end{equation} The first-order ODE can be solved with the integrating factor. \begin{equation} \mu(t) = e^{\int k dt} = e^{kt} \end{equation} \begin{equation} \frac{d}{dt} \left [P e^{kt} \right ] = e^{kt} kD_{To} \end{equation} Integrating \begin{equation} \left [P e^{kt} \right ] = \int_0^t e^{kt} kD_{To} = D_{To} e^{kt} - D_{To} = D_{To} (e^{kt} - 1) \end{equation} Rearranging: \begin{equation} P = D_{To} (1 - e^{-k t}) = D_{To} \left [ 1 - \exp \left (\exp \left (- A \frac{E}{RT} t \right ) \right ) \right ] \end{equation} The above equation links the experimental data (impurity concentration, $P$, at time points, $t$) to the reaction kinetic parameters (i.e. Arrhenius parameters $A$ and $E$).

One method to obtain the Arrhenius parameters is to a stepwise calculation. First, the impurity concentration experiments conducted at the same temperature, $T$, the kinetic parameter, $k$, at the temperature $T$ can be obtained by plotting log of the drug concentration over time (recall that $P = D_{To} - D$). \begin{eqnarray} D_{To} - D = D_{To} (1-e^{-k t}) \\ D = D_{To} e^{-k t} \\ \ln D = \ln D_{To} - k t \\ \ln \left ( \frac{D}{D_{To}} \right ) = - k t \end{eqnarray}


In [12]:
D = Do + Io - results['P']
data = np.log(D/(Do+Io)).to_frame('ln(D/Do)')
data['t'] = data.index.get_level_values(1)*24*3600
data['k'] = [0]*data['t'].shape[0]

colors = ['blue','green','red','orange']
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
for i, T in enumerate(results.index.levels[0]):
    ax.plot(data.loc[T]['t'], data.loc[T]['ln(D/Do)'], ls='none', marker='o', color=colors[i])
    slope = sm.OLS(data.loc[T]['ln(D/Do)'], data.loc[T]['t']).fit().params[0]
    data.loc[(T,slice(None)),'k'] = -1 * slope
    ax.plot(data.loc[T]['t'], data.loc[T]['t']*slope, 
             ls='--', color=colors[i], label="%.2e 1/s"%(-1*slope))
ax.set_ylabel(r'ln $\left ( D/D_{To} \right )$')
ax.set_xlabel(r'$t$ (sec)')
plt.legend(loc='best')


Out[12]:
<matplotlib.legend.Legend at 0xaaed45ac>

Figure $\ln \left ( \frac{D}{D_o} \right ) ~\mathrm{vs}~t$: to estimate the the rate constant at each experimental temperature, which is subsequently used to estimate the activation energy and collision factor for initial guess in the non-linear regression.


In [13]:
data


Out[13]:
ln(D/Do) t k
Temperature days
25 0 0.000000 0 2.438335e-10
7 -0.000147 604800 2.438335e-10
14 -0.000295 1209600 2.438335e-10
40 0 0.000000 0 1.243739e-09
5 -0.000538 432000 1.243739e-09
10 -0.001074 864000 1.243739e-09
60 0 0.000000 0 8.585526e-09
3 -0.002227 259200 8.585526e-09
7 -0.005192 604800 8.585526e-09
80 0 0.000000 0 4.644401e-08
1 -0.004156 86400 4.644401e-08
5 -0.020035 432000 4.644401e-08

Table of the kinetic parameter, $k$ regressed from data at the different experimental temperatures.

Once the kinetic parameter, $k$, is estimated at the different temperatures, the Arrenius parameters, $A$ and $E$, can be obtained by plotting the logarithm of the rate constant at each experimental temperature against the reciprical temperature. \begin{equation} \ln k = \ln A - \frac{E}{R} \frac{1}{T} \end{equation}


In [14]:
df = pd.DataFrame({'x': 1./(data.index.levels[0]+273).values, 'y': np.log(data.loc[(slice(None),0),'k'])})

reg = sm.ols(formula='y~x', data=df).fit()
A = np.exp(reg.params['Intercept'])
E = -1 * R * reg.params['x']

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.plot(df.x, df.y, marker='o', ls='none', color='blue', label='data')
ax.plot(df.x, df.x*reg.params['x'] + reg.params['Intercept'], ls="--", color='blue', label='fit')
ax.set_xlabel(r"$1/T$ (1/K)")
ax.set_ylabel(r"log $k$")
ax.legend()

print "regression fit for A = %.2e"%A
print "regression fit for E = %.2f"%E


regression fit for A = 1.06e+05
regression fit for E = 19.94

Figure $\ln k ~\mathrm{vs}~\frac{1}{T}$: to estimate the activation energy and the collision factor to use as initial guesses in the subsequent nonlinear regression.

The estimated Arrhenius parameters can now be used to estimate shelf-life.


In [15]:
temp = R*(predictions.index.astype('f8').values+273)
k = A * np.exp(-E/temp)
predictions['first_order'] = impurity_setpoint / k * 1/(3600*24*365)
predictions


Out[15]:
Theory zero_order first_order
25 8.006247 6.470045 6.450609
40 1.556317 1.282699 1.282839
60 0.220225 0.186024 0.186738
80 0.038844 0.033576 0.033817

According to a paper by Fung (Statistical prediction of drug stability based on nonlinear parameter estimation, Fung et al Journal Pharm Sci Vol 73 No 5 pg 657-662 1984), improved confidence can be obtained by performing a non-linear regression of all the data simuntaneously, instead of the stepwise approach. In Fung's paper, they pre-assumed a shelf-life defined by 10% degredation of the product, which simplifies the estimation. However, the regression done here will be performed without the assumption.

\begin{equation} D = D_{To} e^{-k t} \end{equation}\begin{equation} \ln D = \ln D_{To} - kt \end{equation}\begin{equation} \ln \left ( \frac{D}{D_{To}} \right ) = - A \exp \left ( -\frac{E}{RT} \right ) t \end{equation}\begin{equation} \left ( \frac{\ln \left ( \frac{D}{D_{To}} \right )}{t} \right ) + A \exp \left ( -\frac{E}{RT} \right ) = 0 \end{equation}


Thus, the Arrhenius parameters are found by finding the roots to the above equation, where $A$ and $E$ are the variables.


In [16]:
D = Do + Io - results['P'].values
df = pd.DataFrame(np.log(D/(Do+Io)), columns=['D*'])
df['t'] = data.index.get_level_values(1)*24*3600
df['T'] = data.index.get_level_values(0) + 273.
df = df.loc[df['t']!=0]
def error(p):
    A = p[0]
    E = p[1]
    return np.sum((df['D*']/df['t'] + A * np.exp(-1. * E/(R*df['T'])))**2)

The optimization will be performed over a range of initial conditions to inspect convergence problems.


In [17]:
xo = np.array([A,E])
opt = []
for d in np.arange(.1,1.5,.1):
    opt.append(minimize(error, xo*d, tol=1e-30))

Plot the square errors for a series of initial conditions.


In [43]:
e2 = np.array([error(r.x) for r in opt])

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")

A, E = (opt[e2.argmin()]).x
print "optimal A = %.2e 1/s" %A
print "optimal E = %.2f kcal/mol K" %E


optimal A = 9.58e+04 1/s
optimal E = 19.87 kcal/mol K

Figure of square error indicates that the objective function likely has some very shallow gradients near the minimum, thus inhibiting convergence to a unique value.


In [44]:
temp = R*(predictions.index.astype('f8').values+273)
k = A * np.exp(-E/temp)
predictions['first_order_nonlinear'] = impurity_setpoint / k * 1/(3600*24*365)
predictions


Out[44]:
Theory zero_order first_order first_order_nonlinear
25 8.006247 6.470045 6.450609 6.270671
40 1.556317 1.282699 1.282839 1.255067
60 0.220225 0.186024 0.186738 0.184097
80 0.038844 0.033576 0.033817 0.033565

Table of precition comparison suggests that zero_order was actually the most accurate of the models. It should be noted that the convergence of the non-linear regression is questionable. The under prediction of shelf-life is to be expected, because of the faster reaction of the amorphous form that gets consumed early in the process.